Ashley Jones
Activity 12: Spreadsheets
My knowledge about spreadsheets is minimal to say the least. But I was eager to begin my investigations of spreadsheets using data that was interesting to me. This is a great way to get students involved and motivated by having them use data statistics that appeal to their interests. Since I am off at college and on my own for the most part, I am very interested in the data focused on gas prices (unleaded). It is not always a pleasant thing to drive to the gas station and fill up. For this investigation I am using data found from the U.S. Bureau of Labor Statistics ( http://data.bls.gov/cgi-bin/surveymost). While there is lots of data available for this intriguing subject, I used the data from 2009 (January to December).
Here is a view of my spread sheet as I entered the month in column one and the gas price in column two.
For this particular set of data I wanted to explore the function aspect of spreadsheets. Therefore, I began by plotting the data on an xy-scatterplot before finding best fit equations. Below is the scatter plot I created by using the inputted data.
Rather than the x-axis being labeled by the actual month name, we are using the number of the month. So, January is represented by 1, February by 2, and so on. It is apparent that for the year of 2009 gas prices were much higher the second half of the year. However, I am unsure by looking at the data points what type of function would best fit the data. To figure out which function would be best, I am going to compare the different lines of best fit and their squared error values. Which ever best fit function has the squared error closest to the value of 1 will be the best fit function for the data. We square the error value of the function to the data points so as to always be able to compare positive values. This makes our comparison process a little bit easier.
For the data in our scatter plot, I first found the linear line of best fit. By highlighting the scatter plot and its data points I was able to quickly add the linear line of best fit to the scatter plot, including the function and its squared error value. Below is the scatter plot with the linear function of best fit included. As you can see, the squared error value about 0.809 which shows that there is a strong correlation between the data and linear line of best fit.
Next for the data in the scatter plot, I found the quadratic line of best fit. Through the same process I was able to find the equation and squared error value of this best fit quadratic function. While it seems as though the data will not show much of a correlation to a quadratic function, I still wanted to investigate. Below is the scatter plot with the quadratic function of best fit included. As you can see, I was very wrong about the data not showing a correlation to a quadratic function. The squared error to the best fit quadratic equation is approximately 0.9197, which is even closer to 1, showing a greater correlation to the data than the linear line of best fit.
The remaining lines of best fit that I tested were as follows: Logarithmic, exponential, and power. Below I have labeled the squared error values for each line of best fit.
Logarithmic line of best fit: (squared error value = 0.87038)
Exponential line of best fit: (squared error value = 0.80612)
Power line of best fit: (squared error value = 0.88605)
At the end of investigating all available lines of best fit for the given data on gas prices throughout 2009, I found the quadratic model to be the most accurate. The squared error value of this particular model was closest to 1, thus yielding the best function for line of best fit. Below I have added my entire Excel file for your further exploration.